1 Similarity queries I: Efficient similarity search and classification via rank aggregation
Ronald Fagin, Ravi Kumar, D. Sivakumar 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on 
Management of data 

Publisher: ACM Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

We propose a novel approach to performing efficient similarity search and classification in 
high dimensional data. In this framework, the database elements are vectors in a 
Euclidean space. Given a query vector in the same space, the goal is to find elements of 
the database that are similar to the query. In our approach, a small number of 
independent "voters" rank the database elements based on similarity to the query. These 
rankings are then combined by a highly efficient aggregation algorithm. ... 

2 Link-based ranking 2: Searching the workplace web
Ronald Fagin, Ravi Kumar, Kevin S. McCurley, Jasmine Novak, D. Sivakumar, John A. Tomlin, 
David P. Williamson 

May 2003 Proceedings of the 12th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf(231.55 KB) Additional Information: full citation, abstract, references, citings, index terms

The social impact from the World Wide Web cannot be underestimated, but technologies 
used to build the Web are also revolutionizing the sharing of business and government 
information within intranets. In many ways the lessons learned from the Internet carry 
over directly to intranets, but others do not apply. In particular, the social forces that 
guide the development of intranets are quite different, and the determination of a "good 
answer" for intranet search is quite different than on the Int ... 



3 Query evaluation techniques for large databases
Goetz Graefe 

June 1993 ACM Computing Surveys (CSUR), volume 25 issue 2 
Publisher: ACM Press 

Full text available: pdf(9.37 MB) Additional Information: full citation, abstract, references, citings, index terms, review



Database management systems will continue to manage large data volumes. Thus, 
efficient algorithms for accessing and manipulating large sets and sequences will be 
required to provide acceptable performance. The advent of object-oriented and extensible
database systems will not solve this problem. On the contrary, modern data models 
exacerbate the problem: In order to manipulate large sets of complex objects as 
efficiently as today's database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 



4 Industrial and practical experience track paper session 1: The volume and evolution
of web page templates
David Gibson, Kunal Punera, Andrew Tomkins 

May 2005 Special interest tracks and posters of the 14th international conference on 
World Wide Web 

Publisher: ACM Press 

Full text available: pdf(249.32 KB) Additional Information: full citation, abstract, references, index terms

Web pages contain a combination of unique content and template material, which is 
present across multiple pages and used primarily for formatting, navigation, and 
branding. We study the nature, evolution, and prevalence of these templates on the web. 
As part of this work, we develop new randomized algorithms for template extraction that 
perform approximately twenty times faster than existing approaches with similar quality. 
Our results show that 40-50% of the content on the web is templa ... 

Keywords: algorithms, boilerplate, data cleaning, data mining, templates, web mining 




5 Fast detection of communication patterns in distributed executions 
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced 
Studies on Collaborative research 

Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the 
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex 
and do not provide the user with the desired overview of the application. In our 
experience, such tools display repeated occurrences of non-trivial commun ... 

6 Searching the Web
August 2001 ACM Transactions on Internet Technology (TOIT), Volume 1 Issue 1 
Publisher: ACM Press 

Full text available: pdf(319.98 KB) Additional Information: full citation, abstract, references, citings, index terms, review

We offer an overview of current Web search engine design. After introducing a generic 
search engine architecture, we examine each engine component in turn. We cover 
crawling, local Web page storage, indexing, and the use of link analysis for boosting 
search performance. The most common design and implementation techniques for each of 
these components are presented. For this presentation we draw from the literature and 
from our own experimental search engine testbed. Emphasis is on introduci ... 

Keywords: HITS, PageRank, authorities, crawling, indexing, information retrieval, link 
analysis, search engine

7 Querying web metadata: Native score management and text support in databases 
Gültekin Özsoyoğlu, İsmail Sengör Altıngövde, Abdullah Al-Hamdani, Selma Ayşe Özel,
Özgür Ulusoy, Zehra Meral Özsoyoğlu

December 2004 ACM Transactions on Database Systems (TODS), volume 29 issue 4 
Publisher: ACM Press 

Full text available: pdf(737.76 KB) Additional Information: full citation, abstract, references, index terms

In this article, we discuss the issues involved in adding a native score management 
system to object-relational databases, to be used in querying Web metadata (that 
describes the semantic content of Web resources). The Web metadata model is based on 
topics (representing entities), relationships among topics (called metalinks), and 
importance scores (sideway values) of topics and metalinks. We extend database 
relations with scoring functions and importance scores. We add to SQL score-manag ... 

Keywords: Score management for Web applications 



8 Information retrieval on the web
Mei Kobayashi, Koichi Takeda 

June 2000 ACM Computing Surveys (CSUR), volume 32 issue 2 
Publisher: ACM Press 

Full text available: pdf(213.89 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we review studies of the growth of the Internet and technologies that are 
useful for information search and retrieval on the Web. We present data on the Internet 
from several different sources, e.g., current as well as projected number of users, hosts, 
and Web sites. Although numerical figures vary, overall trends cited by the sources are 
consistent and point to exponential growth in the past and in the coming decade. Hence it 
is not surprising that about 85% of Internet user ... 

Keywords: Internet, World Wide Web, clustering, indexing, information retrieval, 
knowledge management, search engine 



9 Service discovery: Discovering and ranking web services with BASIL: a personalized
approach with biased focus
James Caverlee, Ling Liu, Daniel Rocco 

November 2004 Proceedings of the 2nd international conference on Service oriented 
computing 

Publisher: ACM Press 

Full text available: pdf(283.05 KB) Additional Information: full citation, abstract, references, index terms

In this paper we present a personalized web service discovery and ranking technique for 
discovering and ranking relevant data-intensive web services. Our first prototype, called
BASIL, supports a personalized view of data-intensive web services through
source-biased focus. BASIL provides service discovery and ranking through source-biased 
probing and source-biased relevance metrics. Concretely, the BASIL approach has three 
unique features: (1) It is able to determine in ver ... 

Keywords: biased discovery, data-intensive services, ranking 




http://portal.acm.org/results.cfm?CFTO=9244&CFrOKEN=76968792 6/21/06

Results (page 1): ^catalog +multiple +sites +ranking +aggregating Page 4 of 7

10 Link-based ranking 2: A new paradigm for ranking pages on the world wide web
John A. Tomlin 

May 2003 Proceedings of the 12th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf(112.28 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes a new paradigm for modeling traffic levels on the world wide web 
(WWW) using a method of entropy maximization. This traffic is subject to the 
conservation conditions of a circulation flow in the entire WWW, an aggregation of the 
WWW, or a subgraph of the WWW (such as an intranet or extranet). We specifically apply 
the primal and dual solutions of this model to the (static) ranking of web sites. The first of 
these uses an imputed measure of total traffic through a web page, t ... 

Keywords: entropy, optimization, search engines, static ranking 



11 Discovering critical edge sequences in E-commerce catalogs
Kaushik Dutta, Debra VanderMeer, Anindya Datta, Krithi Ramamritham 
October 2001 Proceedings of the 3rd ACM conference on Electronic Commerce 
Publisher: ACM Press 

Full text available: pdf(270.53 KB) Additional Information: full citation, abstract, references, index terms

Web sites allow the collection of vast amounts of navigational data — clickstreams of user 
traversals through the site. These massive data stores offer the tantalizing possibility of 
uncovering interesting patterns within the dataset. For e-businesses, always looking for 
an edge in the hyper-competitive online marketplace, this possibility is of particular 
interest. Of significant particular interest to e-businesses is the discovery of Critical Edge 
Sequences (CES), which denote f ... 

Keywords: accuracy, approximation, critical edge sequence, web site performance, web 
usage analysis 





12 Efficiently serving dynamic data at highly accessed web sites

James R. Challenger, Paul Dantzig, Arun Iyengar, Mark S. Squillante, Li Zhang 
April 2004 IEEE/ACM Transactions on Networking (TON), volume 12 issue 2 
Publisher: IEEE Press 

Full text available: pdf(499.05 KB) Additional Information: full citation, abstract, references, index terms

We present architectures and algorithms for efficiently serving dynamic data at highly 
accessed Web sites together with the results of an analysis motivating our design and 
quantifying its performance benefits. This includes algorithms for keeping cached data 
consistent so that dynamic pages can be cached at the Web server and dynamic content 
can be served at the performance level of static content. We show that our system design 
is able to achieve cache hit ratios close to 100% for cached data ... 

Keywords: caching, dynamic content, performance analysis, prefetching, stochastic 
models, web sites 



13 GLARE: A Grid Activity Registration, Deployment and Provisioning Framework 
Mumtaz Siddiqui, Alex Villazon, Jurgen Hofer, Thomas Fahringer 

November 2005 Proceedings of the 2005 ACM/IEEE conference on Supercomputing SC 
'05 

Publisher: IEEE Computer Society 
Full text available: pdf(1.43 MB), Publisher Site Additional Information: full citation, abstract

Resource management is a key concern for implementing effective Grid middleware and 
shielding application developers from low level details. Existing resource managers 
concentrate mostly on physical resources. However, some advanced Grid programming 
environments allow application developers to specify Grid application components at high 
level of abstraction which then requires an effective mapping between high level 
application description (activity types) and actual deployed software components ... 



14 Image Retrieval from the World Wide Web: Issues, Techniques, and Systems
M. L. Kherfi, D. Ziou, A. Bernardi

March 2004 ACM Computing Surveys (CSUR), Volume 36 issue 1
Publisher: ACM Press 

Full text available: pdf(294.13 KB) Additional Information: full citation, abstract, references, index terms

With the explosive growth of the World Wide Web, the public is gaining access to massive 
amounts of information. However, locating needed and relevant information remains a 
difficult task, whether the information is textual or visual. Text search engines have 
existed for some years now and have achieved a certain degree of success. However, 
despite the large number of images available on the Web, image search engines are still 
rare. In this article, we show that in order to allow people to profi ... 

Keywords: Image-retrieval, World Wide Web, crawling, feature extraction and selection, 
indexing, relevance feedback, search, similarity 



15 An adaptive real-time Web search engine 
Augustine Chidi Ikeji, Farshad Fotouhi 

November 1999 Proceedings of the 2nd international workshop on Web information
and data management 
Publisher: ACM Press 

Full text available: pdf(813.68 KB) Additional Information: full citation, abstract, references, index terms

The Internet provides a wealth of information scattered all over the world. The fact that 
the information may be located anywhere makes it both convenient for placing 
information on the Web and difficult for others to find. Conventional search engines can 
only locate information that is in their search index and users do not have much choice in 
limiting or expanding the search parameters. Some web pages like those for news 
services change frequently and will not work well with index based s ... 

16 The state of the art in distributed query processing
Donald Kossmann 

December 2000 ACM Computing Surveys (CSUR), volume 32 issue 4 
Publisher: ACM Press 

Full text available: pdf(455.39 KB) Additional Information: full citation, abstract, references, citings, index terms

Distributed data processing is becoming a reality. Businesses want to do it for many 
reasons, and they often must do it in order to stay competitive. While much of the 
infrastructure for distributed data processing is already there (e.g., modern network 
technology), a number of issues make distributed data processing still a complex 
undertaking: (1) distributed systems can become very large, involving thousands of 
heterogeneous sites including PCs and mainframe server machines; (2) the stat ... 

Keywords: caching, client-server databases, database application systems, 
dissemination-based information systems, economic models for query processing, 
middleware, multitier architectures, query execution, query optimization, replication,
wrappers 



17 Plenary papers: Learning the semantics of multimedia queries and concepts from a
small number of examples
Apostol (Paul) Natsev, Milind R. Naphade, Jelena Tešić

November 2005 Proceedings of the 13th annual ACM international conference on 
Multimedia MULTIMEDIA '05

Publisher: ACM Press 

Full text available: pdf(533.74 KB) Additional Information: full citation, abstract, references, index terms

In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task 
involves answering queries with a few examples. The other involves learning models for 
semantic concepts, also with a few examples. In our view these two tasks are identical 
with the only differentiation being the number of examples that are available for training. 
Once we adopt this unified view, we then apply identical techniques for solving both 
problems and evaluate the performance using the NIST TRECVID b ...

Keywords: MECBR, TRECVID, semantics, support vector machines 




18 Business models and market mechanisms: evaluating efficiencies in consumer 
electronic markets 
Jonathan Palmer, Markus Lindemann 
June 2003 ACM SIGMIS Database, volume 34 issue 2 
Publisher: ACM Press 

Full text available: pdf(287.67 KB) Additional Information: full citation, abstract, references

The paper examines business models utilizing three different market mechanisms on the 
Internet: direct search, broker, and dealer. Utilizing capital markets and information 
theory to compare the business models, the research looks at specific market 
mechanisms instantiated in PriceScan, NetMarket, and Bottom Dollar. The web sites
supporting the market structures were also evaluated on trust mechanisms, reputational 
ratings, information quality, availability, speed, and liquidity. Twenty standard ... 

Keywords: efficiency, electronic markets, market structure, world wide web 




19 Special issue: AI in engineering
D. Sriram, R. Joobbani

April 1985 ACM SIGART Bulletin, issue 92

Publisher: ACM Press 

Full text available: pdf(8.79 MB) Additional Information: full citation, abstract

The papers in this special issue were compiled from responses to the announcement in 
the July 1984 issue of the SIGART newsletter and notices posted over the ARPAnet. The 
interest being shown in this area is reflected in the sixty papers received from over six 
countries. About half the papers were received over the computer network. 

20 NSDL: Core services in the architecture of the national science digital library (NSDL)
Carl Lagoze, William Arms, Stoney Gan, Diane Hillmann, Christopher Ingram, Dean Krafft, 
Richard Marisa, Jon Phipps, John Saylor, Carol Terrizzi, Walter Hoehn, David Millman, James 
Allan, Sergio Guzman-Lara, Tom Kalt 

July 2002 Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries 
Publisher: ACM Press 
Full text available: pdf(940.66 KB) Additional Information: full citation, abstract, references, citings, index terms

We describe the core components of the architecture for the National Science Digital 
Library (NSDL). Over time the NSDL will include heterogeneous users, content, and 
services. To accommodate this, a design for a technical and organization infrastructure 
has been formulated based on the notion of a spectrum of interoperability. This paper 
describes the first phase of the interoperability infrastructure including the metadata 
repository, search and discovery services, rights management services, ... 

Keywords: architecture, interoperability, metadata, testbeds 
21 TinyDB: an acquisitional query processing system for sensor networks
Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong 
March 2005 ACM Transactions on Database Systems (TODS), volume 30 issue 1
Publisher: ACM Press 

Full text available: pdf(1.67 MB) Additional Information: full citation, abstract, references, index terms

We discuss the design of an acquisitional query processor for data collection in sensor 
networks. Acquisitional issues are those that pertain to where, when, and how often data 
is physically acquired (sampled) and delivered to query processing operators. By focusing 
on the locations and costs of acquiring data, we are able to significantly reduce power 
consumption over traditional passive systems that assume the a priori existence of data. 
We discuss simple extensions to SQL for controlli ... 

Keywords: Query processing, data acquisition, sensor networks 



22 Automatic personalization based on Web usage mining
Bamshad Mobasher, Robert Cooley, Jaideep Srivastava 
August 2000 Communications of the ACM, Volume 43 Issue 8 
Publisher: ACM Press 

Full text available: pdf(2.62 MB), html(49.24 KB) Additional Information: full citation, references, citings, index terms



23 Building and using cultural digital libraries: Primarily history: historians and the search
for primary source materials
Helen R. Tibbo

July 2002 Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries 
Publisher: ACM Press 

Full text available: pdf(225.16 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes the first phase of an international project that is exploring how 
historians locate primary resource materials in the digital age, what they are teaching 
their Ph.D. students about finding research materials, and what archivists are doing to 
facilitate access to these materials. Preliminary findings are presented from a survey of 
300 historians studying American History from leading institutions of higher education in 
the U.S. Tentative conclusions indicate the need to provi ...

Keywords: archives, historians, historical research, information-seeking behavior, 
manuscript repositories, primary resources, users 



24 Adaptive performance prediction for distributed data-intensive applications 
Marcio Faerman, Alan Su, Richard Wolski, Francine Berman 

January 1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing 
(CDROM) 

Publisher: ACM Press 

Full text available: pdf(292.25 KB) Additional Information: full citation, references, citings, index terms




25 STARTS: Stanford proposal for Internet meta-searching
Luis Gravano, Chen-Chuan K. Chang, Héctor García-Molina, Andreas Paepcke
June 1997 ACM SIGMOD Record, Proceedings of the 1997 ACM SIGMOD international
conference on Management of data SIGMOD '97, volume 26 issue 2
Publisher: ACM Press 

Full text available: pdf(1.53 MB) Additional Information: full citation, abstract, references, citings, index terms

Document sources are available everywhere, both within the internal networks of 
organizations and on the Internet. Even individual organizations use search engines from 
different vendors to index their internal document collections. These search engines are 
typically incompatible in that they support different query models and interfaces, they do 
not return enough information with the query results for adequate merging of the results, 
and finally, in that they do not export metadata about t ... 

26 Digital libraries for spatial data: The ADEPT digital library architecture
Greg Janee, James Frew 

July 2002 Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries 
Publisher: ACM Press 

Full text available: pdf(263.61 KB) Additional Information: full citation, abstract, references, citings, index terms

The Alexandria Digital Earth ProtoType (ADEPT) architecture is a framework for building 
distributed digital libraries of georeferenced information. An ADEPT system comprises one 
or more autonomous libraries, each of which provides a uniform interface to one or more 
collections, each of which manages metadata for one or more items. The primary 
standard on which the architecture is based is the ADEPT bucket framework, which 
defines uniform client-level metadata query services that are compatible w ... 

Keywords: bucket framework, collection discovery, distribution, interoperability, 
metadata 



27 Information systems outsourcing: a survey and analysis of the literature 
Jens Dibbern, Tim Goles, Rudy Hirschheim, Bandula Jayatilaka 
November 2004 ACM SIGMIS Database, volume 35 issue 4 
Publisher: ACM Press 

Full text available: pdf(1.51 MB) Additional Information: full citation, abstract, references

In the last fifteen years, academic research on information systems (IS) outsourcing has 
evolved rapidly. Indeed the field of outsourcing research has grown so fast that there has 
been scant opportunity for the research community to take a collective breath, and 
complete a global assessment of research activities to date. This paper seeks to address
this need by exploring and synthesizing the academic literature on IS outsourcing. It 
offers a roadmap of the IS outsourcing literature, highligh ... 

Keywords: determinants, literature review, outcomes, outsourcing, relationships, 
research approaches, theoretical foundations 



28 Recommender systems in e-commerce 
J. Ben Schafer, Joseph Konstan, John Riedl

November 1999 Proceedings of the 1st ACM conference on Electronic commerce 

Publisher: ACM Press 

Full text available: pdf(112.96 KB) Additional Information: full citation, references, citings, index terms



Keywords: cross-sell, customer loyalty, electronic commerce, interface, mass 
customization, recommender systems, up-sell 



29 An Internet-based negotiation server for e-commerce 

Stanley Y.W. Su, Chunbo Huang, Joachim Hammer, Yihua Huang, Haifei Li, Liu Wang,
Youzhong Liu, Charnyote Pluempitiwiriyawej, Minsoo Lee, Herman Lam

August 2001 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 10 Issue 1
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(355.19 KB) Additional Information: full citation, abstract, citings, index terms

This paper describes the design and implementation of a replicable, Internet-based 
negotiation server for conducting bargaining-type negotiations between enterprises 
involved in e-commerce and e-business. Enterprises can be buyers and sellers of 
products/services or participants of a complex supply chain engaged in purchasing, 
planning, and scheduling. Multiple copies of our server can be installed to complement the 
services of Web servers. Each enterprise can install or select a trusted negotia ... 

Keywords: Constraint evaluation, Cost-benefit analysis, Database, E-commerce,
Negotiation policy and strategy, Negotiation protocol 



30 Quantifying and computing with structure: As we may perceive: inferring logical
documents from hypertext

Pavel Dmitriev, Carl Lagoze, Boris Suchkov 

September 2005 Proceedings of the sixteenth ACM conference on Hypertext and 
hypermedia HYPERTEXT '05 

Publisher: ACM Press 

Full text available: pdf(670.50 KB) Additional Information: full citation, abstract, references, index terms

In recent years, many algorithms for the Web have been developed that work with 
information units distinct from individual web pages. These include segments of web 
pages or aggregation of web pages into web communities. Such logical information units 
improve a variety of web algorithms and provide the building blocks for the construction 
of organized information spaces such as digital libraries. In this paper, we focus on a type 
of logical information units called "compound documents". We argue ... 

Keywords: WWW, clustering, compound documents 
31 Fighting search spam: Detecting spam web pages through content analysis
Alexandros Ntoulas, Marc Najork, Mark Manasse, Dennis Fetterly
May 2006 Proceedings of the 15th international conference on World Wide Web
WWW '06
Publisher: ACM Press 

Full text available: pdf(259.06 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we continue our investigations of "web spam": the injection of artificially- 
created pages into the web in order to influence the results from search engines, to drive 
traffic to certain pages for fun or profit. This paper considers some previously-undescribed 
techniques for automatically detecting spam pages, examines the effectiveness of these 
techniques in isolation and when aggregated using classification algorithms. When 
combined, our heuristics correctly identify 2,037 (86.2% ... 

Keywords: data mining, web characterization, web pages, web spam 



32 NSF workshop on industrial/academic cooperation in database systems 
Mike Carey, Len Seligman 

March 1999 ACM SIGMOD Record, volume 28 issue 1
Publisher: ACM Press 

Full text available: pdf(1.96 MB) Additional Information: full citation, index terms




33 Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky 
Survey 

Alexander S. Szalay, Peter Z. Kunszt, Ani Thakar, Jim Gray, Don Slutz, Robert J. Brunner
May 2000 ACM SIGMOD Record, Proceedings of the 2000 ACM SIGMOD international
conference on Management of data SIGMOD '00, volume 29 issue 2
Publisher: ACM Press 

Full text available: pdf(429.09 KB) Additional Information: full citation, abstract, references, citings, index terms

The next-generation astronomy digital archives will cover most of the sky at fine 
resolution in many wavelengths, from X-rays, through ultraviolet, optical, and infrared. 
The archives will be stored at diverse geographical locations. One of the first of these 
projects, the Sloan Digital Sky Survey (SDSS) is creating a 5-wavelength catalog over 
10,000 square degrees of the sky (see http://www.sdss.org/). The 200 million objects in 
the multi-terabyte database will have mostly numerical attribut ... 

Keywords: Internet, archive, astronomy, data analysis, data mining, database, scalable 



34 Gathering at the well: creating communities for grid I/O
Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny
November 2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing
(CDROM)
Publisher: ACM Press 

Full text available: pdf(139.41 KB) Additional Information: full citation, abstract, references, citings, index terms

Grid applications have demanding I/O needs. Schedulers must bring jobs and data in 
close proximity in order to satisfy throughput, scalability, and policy requirements. Most 
systems accomplish this by making either jobs or data mobile. We propose a system that 
allows jobs and data to meet by binding execution and storage sites together into I/O 
communities which then participate in the wide-area system. The relationships between 
participants in a community may be expressed by the ClassAd ...

35 SenseMaker: an information-exploration interface supporting the contextual evolution
of a user's interests
Michelle Q. Wang Baldonado, Terry Winograd

March 1997 Proceedings of the SIGCHI conference on Human factors in computing
systems
Publisher: ACM Press 

Full text available: pdf(1.02 MB) Additional Information: full citation, references, citings, index terms



Keywords: digital libraries, information exploration, information retrieval, information 
seeking 



36 Digital libraries, value, and productivity
Gio Wiederhold 

April 1995 Communications of the ACM, Volume 38 Issue 4 
Publisher: ACM Press 

Full text available: pdf(292.07 KB) Additional Information: full citation, abstract, references, citings, index terms

A digital library is popularly viewed as an electronic version of a public library. But replacing
paper by electronic storage leads to three major differences: storage in digital form, direct 
communication to obtain material, and copying from a master version. These differences 
in turn lead to a plethora of further differences, so that eventually the digital library no 
longer mimics the traditional library. Furthermore, a library is only one element in the process
of creating, storing, culling, ac ... 

37 Streams, structures, spaces, scenarios, societies (5s): A formal model for digital
libraries
Marcos André Gonçalves, Edward A. Fox, Layne T. Watson, Neill A. Kipp
April 2004 ACM Transactions on Information Systems (TOIS), volume 22 issue 2 
Publisher: ACM Press 

Full text available: pdf(316.85 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Digital libraries (DLs) are complex information systems and therefore demand formal 
foundations lest development efforts diverge and interoperability suffers. In this article, 
we propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and 
Societies (5S), which allow us to define digital libraries rigorously and usefully. Streams 
are sequences of arbitrary items used to describe both static and dynamic (e.g., video) 
content. Structures can be viewed as labeled directed gra ... 

Keywords: applications, definitions, foundations, taxonomy



38 HyPursuit: a hierarchical network search engine that exploits content-link hypertext
clustering 

Ron Weiss, Bienvenido Velez, Mark A. Sheldon 

March 1996 Proceedings of the seventh ACM conference on Hypertext
Publisher: ACM Press 

Full text available: pdf(2.00 MB) Additional Information: full citation, references, citings, index terms

39 Research track papers: Web usage mining based on probabilistic latent semantic
analysis

Xin Jin, Yanzan Zhou, Bamshad Mobasher 

August 2004 Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '04 

Publisher: ACM Press 

Full text available: pdf(747.27 KB) Additional Information: full citation, abstract, references, citings, index terms

The primary goal of Web usage mining is the discovery of patterns in the navigational 
behavior of Web users. Standard approaches, such as clustering of user sessions and 
discovering association rules or frequent navigational paths, do not generally provide the 
ability to automatically characterize or quantify the unobservable factors that lead to 
common navigational patterns. It is, therefore, necessary to develop techniques that can 
automatically discover hidden semantic relationships among use ... 

Keywords: PLSA, Web usage mining, user profiling 



40 Q focus: semi-structured data: Why your data won't mix 
Alon Halevy

October 2005 Queue, volume 3 issue 8
Publisher: ACM Press 

Full text available: pdf(442.63 KB) Additional Information: full citation, abstract, references, index terms

New tools and techniques can help ease the pain of reconciling schemas.

41 Mining multimedia data

Osmar R. Zaïane, Jiawei Han, Ze-Nian Li, Jean Hou

November 1998 Proceedings of the 1998 conference of the Centre for Advanced 
Studies on Collaborative research 

Publisher: IBM Press 

Full text available: pdf(377.84 KB) Additional Information: full citation, abstract, references, citings, index terms

Data Mining is a young but flourishing field. Many algorithms and applications exist to 
mine different types of data and extract different types of knowledge. Mining multimedia 
data is, however, at an experimental stage. We have implemented a prototype for mining 
high-level multimedia information and knowledge from large multimedia databases. 
MultiMedia Miner has been designed based on our years of experience in the research and 
development of a relational data mining system, DBMiner, in the Inte ... 

Keywords: data cube, data mining, data warehousing, image analysis, information 
retrieval, multimedia, world-wide web 



42 Queue Focus: Search: Enterprise Search: Tough Stuff 
Rajat Mukherjee, Jianchang Mao 
April 2004 Queue, volume 2 issue 2 
Publisher: ACM Press 

Full text available: pdf(2.29 MB), html(37.45 KB) Additional Information: full citation, citings, index terms



43 Research track: Cross-training: learning probabilistic mappings between topics 

Sunita Sarawagi, Soumen Chakrabarti, Shantanu Godbole 
August 2003 Proceedings of the ninth ACM SIGKDD international conference on
Knowledge discovery and data mining 

Publisher: ACM Press 

Full text available: pdf(331.75 KB) Additional Information: full citation, abstract, references, citings, index terms

Classification is a well-established operation in text mining. Given a set of labels A and a
set D_A of training documents tagged with these labels, a classifier learns to assign labels
to unlabeled test documents. Suppose we also had available a different set of labels B,
together with a set of documents D_B marked with labels from B. If A and B have some
semantic overlap, can the availability of D_B help us b ...

Keywords: EM, document classification, semi-supervised multi-task learning, support 
vector machines 



44 Extensible query processing in starburst
L. M. Haas, J. C. Freytag, G. M. Lohman, H. Pirahesh 

June 1989 ACM SIGMOD Record, Proceedings of the 1989 ACM SIGMOD international
conference on Management of data SIGMOD '89, volume 18 issue 2
Publisher: ACM Press 

Full text available: pdf(1.63 MB) Additional Information: full citation, abstract, references, citings, index terms

Today's DBMSs are unable to support the increasing demands of the various applications 
that would like to use a DBMS. Each kind of application poses new requirements for the 
DBMS. The Starburst project at IBM's Almaden Research Center aims to extend relational 
DBMS technology to bridge this gap between applications and the DBMS. While providing 
a full function relational system to enable sharing across applications, Starburst will also 
allow (sophisticated) programmers to add many kinds of ... 

45 Special section on grid computing: Experiences with predicting resource performance
on-line in computational grid settings
Rich Wolski 

March 2003 ACM SIGMETRICS Performance Evaluation Review, volume 30 issue 4
Publisher: ACM Press 

Full text available: pdf(1.07 MB) Additional Information: full citation, abstract, references

In this paper, we describe methods for predicting the performance of Computational Grid 
resources (machines, networks, storage systems, etc.) using computationally inexpensive 
statistical techniques. The predictions generated in this manner are intended to support 
adaptive application scheduling in Grid settings, and on-line fault detection. We describe a
mixture-of-experts approach to non-parametric, univariate time-series forecasting, and 
detail the effectiveness of the approach using example d ... 

46 The role of intermediary services in emerging digital libraries
Allen Brewer, Wei Ding, Karla Hahn, Anita Komlodi 

April 1996 Proceedings of the first ACM international conference on Digital libraries 
Publisher: ACM Press 

Full text available: pdf(860.18 KB) Additional Information: full citation, references, citings, index terms



47 Research sessions: query uncertainty: Automatic categorization of query results 

Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang
June 2004 Proceedings of the 2004 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press 

Full text available: pdf(236.05 KB) Additional Information: full citation, abstract, references

Exploratory ad-hoc queries could return too many answers - a phenomenon commonly 
referred to as "information overload". In this paper, we propose to automatically 
categorize the results of SQL queries to address this problem. We dynamically generate a 
labeled, hierarchical category structure - the user can determine whether a category is
relevant or not by examining simply its label; she can then explore just the relevant
categories and ignore the remaining ones, thereby reducing information overlo ... 

48 Optimization of query streams using semantic prefetching
Ivan T. Bowman, Kenneth Salem 

December 2005 ACM Transactions on Database Systems (TODS), volume 30 issue 4 
Publisher: ACM Press 

Full text available: pdf(1.10 MB) Additional Information: full citation, abstract, references, index terms

Streams of relational queries submitted by client applications to database servers contain 
patterns that can be used to predict future requests. We present the Scalpel system, 
which detects these patterns and optimizes request streams using context-based 
predictions of future requests. Scalpel uses its predictions to provide a form of semantic 
prefetching, which involves combining a predicted series of requests into a single request 
that can be issued immediately. Scalpel's semantic prefetching ... 

Keywords: Prefetching, query streams 




49 Learning classifiers: Using urls and table layout for web classification tasks 
L. K. Shih, D. R. Karger

May 2004 Proceedings of the 13th international conference on World Wide Web 

Publisher: ACM Press 

Full text available: pdf(357.43 KB) Additional Information: full citation, abstract, references, index terms

We propose new features and algorithms for automating Web-page classification tasks 
such as content recommendation and ad blocking. We show that the automated 
classification of Web pages can be much improved if, instead of looking at their textual 
content, we consider each link's URL and the visual placement of those links on a
referring page. These features are unusual: rather than being scalar measurements like 
word counts they are tree structured—describing the position of the item ... 

Keywords: classification, news recommendation, tree structures, web applications 




50 Selected IR-Related Dissertation Abstracts
February 1992 ACM SIGIR Forum, volume 26 issue 1
Publisher: ACM Press 

Full text available: pdf(2.24 MB) Additional Information: full citation



51 Selected IR-Related Dissertation Abstracts
March 1993 ACM SIGIR Forum, volume 27 issue 1
Publisher: ACM Press 

Full text available: pdf(2.24 MB) Additional Information: full citation, abstract

The following are citations selected by title and abstract as being related to Information 
Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of 
the Dissertation Abstracts Online database produced by University Microfilms International 
(UMI). Included are UMI order number, title, author, degree, year, institution; number of 
pages, and abstract. Unless otherwise specified, paper or microform copies of 
dissertations may be ordered from University Microfilms Inter ... 

52 Web Clustering, filtering and applications: On improving local website search using
web server traffic logs: a preliminary report
Qing Cui, Alex Dekhtyar

November 2005 Proceedings of the 7th annual ACM international workshop on Web 
information and data management WIDM '05 

Publisher: ACM Press 

Full text available: pdf(372.73 KB) Additional Information: full citation, abstract, references, index terms

In this paper we give a preliminary report on our study of the use of web server traffic 
logs to improve local search. Web server traffic logs are, typically, private to individual 
websites and, as such, are unavailable to traditional web search engines conducting
searches across multiple web sites. However, they can be used to augment search 
performed by a local search engine, restricted to a single site. Web server traffic logs, 
which we will refer to as simply logs throughout this paper, cont ... 

Keywords: Markov decision process, PageRank, local web search, probabilistic automata, 
web search, web traffic logs 



53 Web search 1: Using ODP metadata to personalize search 

Paul Alexandru Chirita, Wolfgang Nejdl, Raluca Paiu, Christian Kohlschütter
August 2005 Proceedings of the 28th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '05
Publisher: ACM Press 

Full text available: pdf(310.29 KB) Additional Information: full citation, abstract, references, index terms

The Open Directory Project is clearly one of the largest collaborative efforts to manually 
annotate web pages. This effort involves over 65,000 editors and resulted in metadata 
specifying topic and importance for more than 4 million web pages. Still, given that this 
number is just about 0.05 percent of the Web pages indexed by Google, is this effort 
enough to make a difference? In this paper we discuss how these metadata can be 
exploited to achieve high quality personalized web search. First, we ... 

Keywords: biased pageRank, metadata, open directory, personalized search 



54 Model independent assertions for integration of heterogeneous schemas
Stefano Spaccapietra, Christine Parent, Yann Dupont 

July 1992 The VLDB Journal — The International Journal on Very Large Data Bases, 

Volume 1 Issue 1 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(2.15 MB) Additional Information: full citation, abstract, references, citings

Due to the proliferation of database applications, the integration of existing databases into 
a distributed or federated system is one of the major challenges in responding to 
enterprises' information requirements. Some proposed integration techniques aim at
providing database administrators (DBAs) with a view definition language they can use to 
build the desired integrated schema. These techniques leave to the DBA the responsibility 
of appropriately restructuring schema elements from existing I ... 

Keywords: conceptual modeling, database design and integration, distributed databases, 
federated databases, heterogeneous databases, schema integration 



55 Research sessions: P2P and sensor networks: Efficient query reformulation in peer
data management systems 
Igor Tatarinov, Alon Halevy 

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data
Publisher: ACM Press

Full text available: pdf(291.76 KB) Additional Information: full citation, abstract, references, citings

Peer data management systems (PDMS) offer a flexible architecture for decentralized data 
sharing. In a PDMS, every peer is associated with a schema that represents the peer's 
domain of interest, and semantic relationships between peers are provided locally 
between pairs (or small sets) of peers. By traversing semantic paths of mappings, a query 
over one peer can obtain relevant data from any reachable peer in the network. Semantic 
paths are traversed by reformulating queries at a peer int ... 



56 Three-dimensional object recognition 
Paul J. Besl, Ramesh C. Jain 

March 1985 ACM Computing Surveys (CSUR), volume 17 issue 1
Publisher: ACM Press 

Full text available: pdf(7.76 MB) Additional Information: full citation, abstract, references, citings, index terms, review

A general-purpose computer vision system must be capable of recognizing three- 
dimensional (3-D) objects. This paper proposes a precise definition of the 3-D object 
recognition problem, discusses basic concepts associated with this problem, and reviews 
the relevant literature. Because range images (or depth maps) are often used as sensor 
input instead of intensity images, techniques for obtaining, processing, and characterizing 
range data are also surveyed. 




57 Scalable collection summarization and selection 
R. Dolin, D. Agrawal, E. El Abbadi 

August 1999 Proceedings of the fourth ACM conference on Digital libraries 
Publisher: ACM Press 

Full text available: pdf(263.57 KB) Additional Information: full citation, references, citings, index terms




Keywords: automated classification, metadata, resource discovery, scalability 



58 Unchained value: the new logic of digital business 
Mary J. Cronin 

February 2001 Ubiquity, volume 1 issue 46
Publisher: ACM Press 

Full text available: html(56.83 KB) Additional Information: full citation, index terms




59 Rule-based optimization and query processing in an extensible geometric database
system

Ludger Becker, Ralf Hartmut Güting

June 1992 ACM Transactions on Database Systems (TODS), volume 17 issue 2 
Publisher: ACM Press 

Full text available: pdf(3.35 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Gral is an extensible database system, based on the formal concept of a many-sorted 
relational algebra. Many-sorted algebra is used to define any application's query 
language, its query execution language, and its optimization rules. In this paper we
describe Gral's optimization component. It provides (1) a sophisticated rule language, whose
rules are transformations of abstract algebra expressions, (2) a general optimization
framework under which more specific optimization algorithms can be ...

Keywords: extensibility, geometric query processing, many-sorted algebra, optimization,
relational algebra, rule-based optimization 



60 A survey of approaches to automatic schema matching 
Erhard Rahm, Philip A. Bernstein 

December 2001 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 10 Issue 4 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(196.22 KB) Additional Information: full citation, abstract, citings, index terms

Schema matching is a basic problem in many database application domains, such as data 
integration, E-business, data warehousing, and semantic query processing. In current 
implementations, schema matching is typically performed manually, which has significant 
limitations. On the other hand, previous research papers have proposed many techniques 
to achieve a partial automation of the match operation for specific application domains. 
We present a taxonomy that covers many of these existing approach ... 

Keywords: Graph matching, Machine learning, Model management, Schema integration, 
Schema matching

61 Towards the digital government of the 21st century
Herbert Schorr, Salvatore J. Stolfo 

May 2002 Proceedings of the 2002 annual national conference on Digital 
government research dg.o '02 

Publisher: Digital Government Research Center 

Full text available: ^ pdf(319.96 KB) Additional Information: full citation , abstract 

A partnership between Government agencies and the information technologies research 
community has succeeded in the past for the benefit of the Nation. The most notable 
example is the emergence of the Internet as the basis for broad scientific, cultural, civic, 
and commercial discourse, evolving from what was originally a Government-supported 
networking research project. The collaborative development of a new applied research 
domain is critical to help meet the Nation's growing information servic ... 

62 Towards the digital government of the 21st century: a report from the workshop on
research and development opportunities in federal information services 

Herbert Schorr, Salvatore J. Stolfo 

May 2000 Proceedings of the 2000 annual national conference on Digital 
government research dg.o '00 

Publisher: Digital Government Research Center 

Full text available: ^ pdf(339.17 KB) Additional Information: full citation , abstract 

A partnership between Government agencies and the information technologies research 
community has succeeded in the past for the benefit of the Nation. The most notable 
example is the emergence of the Internet as the basis for broad scientific, cultural, civic, 
and commercial discourse, evolving from what was originally a Government-supported 
networking research project. The collaborative development of a new applied research 
domain is critical to help meet the Nation's growing information servic ... 

63 Searching for the needle in the haystack: taxonomies, tags and targets 
Michael Pelikan, James Leous, Richard Pearce, Margaret E. Smith, Russell Vaught

October 2004 Proceedings of the 32nd annual ACM SIGUCCS conference on User
services 
Publisher: ACM Press 

Full text available: pdf(165.65 KB) Additional Information: full citation, abstract, references, index terms

The Penn State Taxonomic Tags group, with representatives from Information 
Technology, Business Administration, and the Penn State Libraries, was formed to 
examine whether a taxonomic set of tags, systematically applied across the university's
Web pages, could (a) make finding specific pages easier from among the University's
greater than 500,000 Web pages, (b) simplify Web content management tasks and (c) 
prove useful over time as search engines continue to evolve and despite whether open 
so ... 

Keywords: content management systems, controlled vocabularies, metadata, 
taxonomies, web search engines 



64 The process group approach to reliable distributed computing 
Kenneth P. Birman 

December 1993 Communications of the ACM, volume 36 issue 12 
Publisher: ACM Press 

Full text available: pdf(6.00 MB) Additional Information: full citation, references, citings, index terms




Keywords: fault-tolerant process groups, message ordering, multicast communication 



65 Combinatorial pattern discovery for scientific data: some preliminary results 
Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce Shapiro, Dennis Shasha, 
Kaizhong Zhang 

May 1994 ACM SIGMOD Record, Proceedings of the 1994 ACM SIGMOD international
conference on Management of data SIGMOD '94, volume 23 issue 2
Publisher: ACM Press 

Full text available: pdf(1.04 MB) Additional Information: full citation, abstract, references, citings, index terms

Suppose you are given a set of natural entities (e.g., proteins, organisms, weather 
patterns, etc.) that possess some important common externally observable properties. 
You also have a structural description of the entities (e.g., sequence, topological, or 
geometrical data) and a distance metric. Combinatorial pattern discovery is the activity of 
finding patterns in the structural data that might explain these common properties based 
on the metric. This paper presents an example o ...

66 Practical evaluation of IR within automated classification systems 
R. Dolin, J. Pierre, M. Butler, R. Avedon 

November 1999 Proceedings of the eighth international conference on Information 
and knowledge management 

Publisher: ACM Press 

Full text available: pdf(909.47 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes some of the work we have done to evaluate and compare the use of 
three IR systems (Verity, LSI, and SMART) as black boxes within an automated 
classification environment. We use automated classification to make a quantitative 
comparison of the effectiveness of the systems within this context. In so doing, we also 
develop criteria for the construction of a useful training set. These results lead to metrics 
useful in the integration of IR systems into larger applications. ... 

Keywords: IR evaluation, automated classification, training sets 



67 Summary of the final report of the NSF workshop on scientific database management
James C. French, Anita K. Jones, John L. Pfaltz 
December 1990 ACM SIGMOD Record, volume 19 issue 4
Publisher: ACM Press

Full text available: pdf(679.90 KB) Additional Information: full citation, abstract, citings, index terms

The National Science Foundation sponsored a two day workshop hosted by the University 
of Virginia on March 12-13, 1990 at which representatives from the earth, life, and space 
sciences met with computer scientists to discuss the issues facing the scientific 
community in the area of database management. The workshop participants concluded
that initiatives by the National Science Foundation and other funding agencies, as well as 
specific discipline professional societies ... 

68 Item-based top-N recommendation algorithms
Mukund Deshpande, George Karypis 

January 2004 ACM Transactions on Information Systems (TOIS), volume 22 issue 1 
Publisher: ACM Press 

Full text available: pdf(240.61 KB) Additional Information: full citation, abstract, references, index terms

The explosive growth of the world-wide-web and the emergence of e-commerce has led to 
the development of recommender systems, a personalized information filtering
technology used to identify a set of items that will be of interest to a certain user. User- 
based collaborative filtering is the most successful technology for building recommender 
systems to date and is extensively used in many commercial recommender systems. 
Unfortunately, the computational complexity of these methods grows I ... 

Keywords: e-commerce, predicting user behavior, world wide web 




69 Selected IR-Related Dissertation Abstracts
May 1991 ACM SIGIR Forum, volume 25 issue 1 
Publisher: ACM Press 

Full text available: pdf(2.71 MB) Additional Information: full citation, abstract

The following are citations selected by title and abstract as being related to Information 
Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of 
the Dissertation Abstracts Online database produced by University Microfilms International 
(UMI). Included are UMI order number, title, author, degree, year, institution; number of 
pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen 
by the author, and abstract. Unless otherwise spec ... 




70 Interoperability for digital libraries worldwide 

Andreas Paepcke, Chen-Chuan K. Chang, Terry Winograd, Hector Garcia-Molina 
April 1998 Communications of the ACM, volume 41 issue 4 

Publisher: ACM Press 

Full text available: pdf(299.48 KB) Additional Information: full citation, references, citings, index terms




71 Steady-state simulation of queueing processes: survey of problems and solutions 
Krzysztof Pawlikowski 

June 1990 ACM Computing Surveys (CSUR), Volume 22 issue 2 
Publisher: ACM Press 

Full text available: pdf(4.75 MB) Additional Information: full citation, abstract, references, citings, index terms

For years computer-based stochastic simulation has been a commonly used tool in the 
performance evaluation of various systems. Unfortunately, the results of simulation 
studies quite often have little credibility, since they are presented without regard to their 
random nature and the need for proper statistical analysis of simulation output data. This
paper discusses the main factors that can affect the accuracy of stochastic simulations
designed to give insight into the steady-st ... 

72 Current practices of leading e-government countries 
Sang M. Lee, Xin Tan, Silvana Trimi 

October 2005 Communications of the ACM, volume 48 issue 10 
Publisher: ACM Press 

Full text available: pdf(129.32 KB), html(29.72 KB) Additional Information: full citation, abstract, references, index terms

IT is transforming the way governments function and valuable lessons can be learned 
from the pioneering e-government programs that have led the charge. 

73 Information retrieval 2: Text joins in an RDBMS for web data integration
Luis Gravano, Panagiotis G. Ipeirotis, Nick Koudas, Divesh Srivastava 
May 2003 Proceedings of the 12th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf(717.46 KB) Additional Information: full citation, abstract, references, citings, index terms

The integration of data produced and collected across autonomous, heterogeneous web 
services is an increasingly important and challenging problem. Due to the lack of global 
identifiers, the same entity (e.g., a product) might have different textual representations 
across databases. Textual data is also often noisy because of transcription errors, 
incomplete information, and lack of standard formats. A fundamental task during data 
integration is matching of strings that refer to the same entity. ... 

Keywords: approximate text matching, data cleaning, text indexing 



74 TIPSTER architecture: TIPSTER text phase II architecture concept
Architecture Committee 

May 1996 Proceedings of a workshop on held at Vienna, Virginia: May 6-8, 1996 

Publisher: Association for Computational Linguistics 

Full text available: pdf(1.28 MB) Additional Information: full citation, abstract

The TIPSTER Architecture is a software architecture for providing Document Detection 
(i.e. Information Retrieval and Message Routing) and Information Extraction functions to 
text handling applications. The high level architecture is described in an Architecture 
Design Document. In May 1996, when the initial architecture design is complete, an 
Interface Control Document will be provided specifying the form and content of all inputs 
and outputs to the TIPSTER modules. 

75 Understanding users II: A qualitative assessment of the efficacy of UML diagrams as
a form of graphical documentation in aiding program understanding 
Scott Tilley, Shihong Huang 

October 2003 Proceedings of the 21st annual international conference on 
Documentation 

Publisher: ACM Press 

Full text available: pdf(274.99 KB) Additional Information: full citation, abstract, references, citings, index terms

Graphical documentation is often characterized as an effective aid in program 
understanding. However, it is an open question exactly which types of graphical 
documentation are most suitable for which types of program understanding tasks (and in 
which specific usage contexts). The Unified Modeling Language (UML) is the de facto 
standard for modeling modern software applications. This paper describes an experiment
to assess the qualitative efficacy of UML diagrams in aiding program understanding. ...

Keywords: Unified Modeling Language (UML), assessment, graphical documentation, 
program understanding 



76 Quantifying the effect of user interface design features on cyberstore traffic and sales
Gerald L. Lohse, Peter Spiller 

January 1998 Proceedings of the SIGCHI conference on Human factors in computing 
systems 

Publisher: ACM Press/Addison-Wesley Publishing Co. 

Full text available: pdf(1.07 MB) Additional Information: full citation, references, citings, index terms




Keywords: Internet retail store design, WWW, economic value, electronic commerce, 
marketing, regression analysis, shopping 



77 Innovation, management & strategy: A web-based consumer-oriented intelligent
decision support system for personalized e-services
Chien-Chih Yu 

March 2004 Proceedings of the 6th international conference on Electronic commerce 
ICEC '04 

Publisher: ACM Press 

Full text available: pdf(479.14 KB) Additional Information: full citation, abstract, references, index terms

Due to the rapid advancement of electronic commerce and web technologies in recent 
years, the concepts and applications of decision support systems have been significantly 
extended. One quickly emerging research topic is the consumer-oriented decision support 
system that provides functional supports to consumers for efficiently and effectively 
making personalized decisions. In this paper we present an integrated framework for 
developing web-based consumer-oriented intelligent decision support sy ... 

Keywords: decision making process, e-services, intelligent decision support system, 
personalization 




78 Technical session 15: WWW image retrieval: Hierarchical clustering of WWW image
search results using visual, textual and link information
Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma, Ji-Rong Wen 

October 2004 Proceedings of the 12th annual ACM international conference on 
Multimedia 

Publisher: ACM Press 

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, index terms

We consider the problem of clustering Web image search results. Generally, the image 
search results returned by an image search engine contain multiple topics. Organizing the 
results into different semantic clusters facilitates users' browsing. In this paper, we 
propose a hierarchical clustering method using visual, textual and link analysis. By using a 
vision-based page segmentation algorithm, a web page is partitioned into blocks, and the 
textual and link information of an image can be accu ... 

Keywords: graph model, image clustering, link analysis, search result organization, 
spectral analysis, vision based page segmentation, web image search

79 I/O reference behavior of production database workloads and the TPC benchmarks —
an analysis at the logical level
Windsor W. Hsu, Alan Jay Smith, Honesty C. Young

March 2001 ACM Transactions on Database Systems (TODS), volume 26 issue 1
Publisher: ACM Press 

Full text available: pdf(5.42 MB) Additional Information: full citation, abstract, references, citings, index terms

As improvements in processor performance continue to far outpace improvements in 
storage performance, I/O is increasingly the bottleneck in computer systems, especially in 
large database systems that manage huge amounts of data. The key to achieving good 
I/O performance is to thoroughly understand its characteristics. In this article we present 
a comprehensive analysis of the logical I/O reference behavior of the peak 
production database workloads from ten of the world's largest corporatio ... 

Keywords: I/O, TPC benchmarks, caching, locality, prefetching, production database 
workloads, reference behavior, sequentiality, workload characterization 

80 A model-based approach to simulation composition
Jesse Aronson, Prasanta Bose 

May 1999 Proceedings of the 1999 symposium on Software reusability 
Publisher: ACM Press 

Full text available: pdf(1.25 MB) Additional Information: full citation, references, index terms



Keywords: component selection, composition, constraints, domain-specific architectural 
model, hierarchical decomposition, simulation

81 Selected IR-Related Dissertation Abstracts

September 1991 ACM SIGIR Forum, volume 25 issue 2 
Publisher: ACM Press 

Full text available: pdf(2.75 MB) Additional Information: full citation, abstract

The following are citations selected by title and abstract as being related to Information 
Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of 
the Dissertation Abstracts Online database produced by University Microfilms 
International (UMI). Included are UMI order number, title, author, degree, year, 
institution; number of pages, one or more Dissertation Abstracts International (DAI) 
subject descriptors chosen by the author, and abstract. Unless otherwise spec ... 



82 Knowledge sharing, quality, and intermediation
Claire Vishik, Andrew B. Whinston 

March 1999 ACM SIGSOFT Software Engineering Notes, Proceedings of the
international joint conference on Work activities coordination and
collaboration WACC '99, volume 24 issue 2
Publisher: ACM Press 

Full text available: pdf(1.33 MB) Additional Information: full citation, abstract, references, index terms

Informal publishing flourished in the World Wide Web environment, where every user with 
a sufficient level of access can become a publisher. Although it appears that in such an 
environment intermediation in the distribution and sharing of information becomes 
unnecessary, the uneven quality of information and resulting quality uncertainty of 
information users, together with the increased search efforts, represent a sufficient 
reason for information and knowledge intermediaries to preserve and eve ... 

Keywords: Internet, World Wide Web, economics of information, information exchange, 
intermediation, knowledge management 



83 Student tracking and personalization: Dynamic assembly of learning objects
Robert G. Farrell, Soyini D. Liburd, John C. Thomas 

May 2004 Proceedings of the 13th international World Wide Web conference on 
Alternate track papers & posters 

Publisher: ACM Press 

Full text available: pdf(307.02 KB) Additional Information: full citation, abstract, references, index terms

This paper describes one solution to the problem of how to select, sequence, and link Web 
resources into a coherent, focused organization for instruction that addresses a user's 
immediate and focused learning need. A system is described that automatically generates 
individualized learning paths from a repository of XML Web resources. Each Web resource 
has an XML Learning Object Metadata (LOM) description consisting of General, 
Educational, and Classification metadata. Dynamic assembly of these le ... 

Keywords: LOM, RDF, assembly, content management, data retrieval, information 
retrieval, instruction, learning object, linking, metadata, organization, semantic web 



84 Internetworking: coordinating technology for systemic reform
Beverly Hunter

May 1993 Communications of the ACM, Volume 36 Issue 5
Publisher: ACM Press 

Full text available: pdf(6.41 MB) Additional Information: full citation, references, index terms, review



85 The consumer side of search: Personalized search
James Pitkow, Hinrich Schütze, Todd Cass, Rob Cooley, Don Turnbull, Andy Edmonds, Eytan
Adar, Thomas Breuel

September 2002 Communications of the ACM, volume 45 issue 9 
Publisher: ACM Press 

Full text available: pdf(530.58 KB), html(41.77 KB) Additional Information: full citation, abstract, references, citings, index terms

A contextual computing approach may prove a breakthrough in personalized search 
efficiency. 

86 Hypermedia in the Large: The structure of broad topics on the web
Soumen Chakrabarti, Mukul M. Joshi, Kunal Punera, David M. Pennock 
May 2002 Proceedings of the 11th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf(771.42 KB) Additional Information: full citation, abstract, references, citings, index terms

The Web graph is a giant social network whose properties have been measured and 
modeled extensively in recent years. Most such studies concentrate on the graph 
structure alone, and do not consider textual properties of the nodes. Consequently, Web 
communities have been characterized purely in terms of graph structure and not on page 
content. We propose that a topic taxonomy such as Yahoo! or the Open Directory 
provides a useful framework for understanding the structure of content-based clusters ... 

Keywords: social network analysis, web bibliometry 



87 E-commerce and tourism 

Hannes Werthner, Francesco Ricci 

December 2004 Communications of the ACM, volume 47 issue 12 
Publisher: ACM Press 

Full text available: pdf(81.45 KB), html(26.31 KB) Additional Information: full citation, abstract, references, index terms

Travel and tourism are illustrating how e-commerce can change the structure of an 
industry—and in the process create new business opportunities.

88 Digest of proceedings seventh IEEE workshop on hot topics in operating systems
March 29-30 1999. Rio Rico, AZ
M. Satyanarayanan

October 1999 ACM SIGOPS Operating Systems Review, volume 33 issue 4 
Publisher: ACM Press 

Full text available: pdf(1.67 MB) Additional Information: full citation, abstract, index terms

The Seventh IEEE Workshop on Hot Topics in Operating Systems was held on March 29- 
30 1999 at the Rio Rico Resort & Country Club, south of Tucson, Arizona. The 
General Chair, Peter Druschel, and the Local Arrangements Chair, John Hartman, had 
gone to considerable effort to make the operation of the workshop smooth and pleasant 
for the participants. The secluded desert locale, the effect of brilliant sunshine and blue 
skies on winter-jaded northerners, and the enthusiasm and energy of the ... 

89 Tools & techniques track: browsing and visualizing collections: An initial evaluation of
automated organization for digital library browsing
Aaron Krowne, Martin Halbert

June 2005 Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries 
Publisher: ACM Press 

Full text available: pdf(243.36 KB) Additional Information: full citation, abstract, references, index terms

In this article we present an evaluation of text clustering and classification methods for 
creating digital library browse interfaces, focusing on the particular case of collections 
made up of heterogeneous metadata records. This situation is common in "portal" style 
digital libraries, which are built by harvesting content from many disparate sources, 
typically using the Open Archives Protocol for Metadata Harvesting (OAI-PMH). By 
studying the activity of users in an experimental system, we find ... 

Keywords: NMF, browsing, categorization, classification, clustering, digital libraries, 
harvesting, portals, taxonomies 



90 Task patterns for user interface design: Modeling patterns for task models
A. Gaffar, D. Sinnig, A. Seffah, P. Forbrig 

November 2004 Proceedings of the 3rd annual conference on Task models and 
diagrams TAMODIA '04 

Publisher: ACM Press 

Full text available: pdf(118.78 KB) Additional Information: full citation, abstract, references, index terms

Models allow us to describe complex systems at different abstract and conceptual levels, 
hence amplify our analytical and problem solving capabilities. However, a lot of human 
effort and experience is needed to build correct models, and to translate them to concrete 
artifacts: in our case a usable user interface. This paper introduces the concept of task 
and pattern models to leverage the process of task modeling, and show how it can help 
build generic task models, link them, and instantiate the ... 

Keywords: generic pattern types, task models, task patterns, user-centered design 




91 Unpacking the semantics of source and usage to perform semantic reconciliation in
large-scale information systems 
Ken Smith, Leo Obrst 

March 1999 ACM SIGMOD Record, Volume 28 issue 1
Publisher: ACM Press

Full text available: pdf(670.04 KB) Additional Information: full citation, abstract, citings, index terms

Semantic interoperability is a growing challenge in the United States Department of 
Defense (DoD). In this paper, we describe the basis of an infrastructure for the 
reconciliation of relevant, but semantically heterogeneous attribute values. Three types of 
information are described which can be used to infer the context of attributes, making 
explicit hidden semantic conflicts and making it possible to adjust values appropriately. 
Through an extended example, we show how an automated integra ... 

92 Reputation and endorsement for web services 
E. Michael Maximilien, Munindar P. Singh 
December 2001 ACM SIGecom Exchanges, volume 3 issue 1

Publisher: ACM Press 

Full text available: pdf(70.18 KB) Additional Information: full citation, abstract, references, citings, index terms

The web services set of standards promise the dynamic creation of loosely coupled 
systems, such as those that are required for e-commerce applications. However, current 
approaches for web services lack key functionality, especially to locate, select, and bind 
services meeting certain criteria of quality. We propose an approach wherein software 
agents assist in this task by disseminating reputations and endorsements through a 
specialized agency, which augments the capabilities of current standard ... 

Keywords: e-commerce, software agents, web services 



93 Special section on sensor network technology & sensor data management (part II):
An initial study of overheads of eddies 

Amol Deshpande 

March 2004 ACM SIGMOD Record, Volume 33 issue 1
Publisher: ACM Press 

Full text available: pdf(95.53 KB) Additional Information: full citation, abstract, references

An eddy [2] is a highly adaptive query processing operator that continuously reoptimizes 
a query in response to changing runtime conditions. It does this by treating query 
processing as routing of tuples through operators and making per-tuple routing decisions. 
The benefits of such adaptivity can be significant, especially in highly dynamic 
environments such as data streams, sensor query processing, web querying, etc. Various 
parties have asserted that the cost of making per-tuple ... 

94 Poster session: Index compression vs. retrieval time of inverted files for XML
documents
Norbert Fuhr, Norbert Gövert

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Publisher: ACM Press 

Full text available: pdf(55.80 KB) Additional Information: full citation, abstract, references, citings, index terms

Query languages for retrieval of XML documents allow for conditions referring both to the 
content and the structure of documents. In this paper, we investigate two different 
approaches for reducing index space of inverted files for XML documents. First, we 
consider methods for compressing index entries. Second, we develop the new XS tree 
data structure which contains the structural description of a document in a rather 
compact form, such that these descriptions can be kept in main memory. ... 

95 Information retrieval models: Detecting similar documents using salient terms
James W. Cooper, Anni R. Coden, Eric W. Brown
November 2002 Proceedings of the eleventh international conference on Information
and knowledge management 

Publisher: ACM Press 

Full text available: pdf(180.68 KB) Additional Information: full citation, abstract, references, citings, index terms

We describe a system for rapidly determining document similarity among a set of 
documents obtained from an information retrieval (IR) system. We obtain a ranked list of 
the most important terms in each document using a rapid phrase recognizer system. We 
store these in a database and compute document similarity using a simple database 
query. If the number of terms found to not be contained in both documents is less than 
some predetermined threshold compared to the total number of terms in the doc ... 

Keywords: databases, document similarity, duplicate documents, shingles, text mining 



96 Demo session: P2P based systems: MINERVA: collaborative P2P search
Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, Christian Zimmer 
August 2005 Proceedings of the 31st international conference on Very large data
bases VLDB '05
Publisher: VLDB Endowment 

Full text available: pdf(234.73 KB) Additional Information: full citation, abstract, references, index terms

This paper proposes the live demonstration of a prototype of MINERVA, a novel P2P Web 
search engine. The search engine is layered on top of a DHT-based overlay network that 
connects an a-priori unlimited number of peers, each of which maintains a personal local 
database and a local search facility. Each peer posts a small amount of metadata to a 
physically distributed directory that is used to efficiently select promising peers from 
across the peer population that can best locally execute a quer ... 



97 A history of the SNOBOL programming languages
Ralph E. Griswold 

January 1978 ACM SIGPLAN Notices, The first ACM SIGPLAN conference on History of
programming languages HOPL-1, volume 13 issue 8 
Publisher: ACM Press 

Full text available: pdf(3.56 MB) Additional Information: full citation, abstract, references, index terms

Development of the SNOBOL language began in 1962. It was followed by SNOBOL2, 
SNOBOL3, and SNOBOL4. Except for SNOBOL2 and SNOBOL3 (which were closely 
related), the others differ substantially and hence are more properly considered separate 
languages than versions of one language. In this paper historical emphasis is placed on 
the original language, SNOBOL, although important aspects of the subsequent languages 
are covered. 



98 Special section on semantic web and data management: Conceptual model of web
service reputation 

E. Michael Maximilien, Munindar P. Singh 
December 2002 ACM SIGMOD Record, volume 31 issue 4 

Publisher: ACM Press 

Full text available: pdf(585.87 KB) Additional Information: full citation, abstract, references, citings

Current Web services standards enable publishing service descriptions and finding 
services on a match based on criteria such as method signatures or service category. 
However, current approaches provide no basis for selecting a good service or for 
comparing ratings of services. We describe a conceptual model for reputation using which 
reputation information can be organized and shared and service selection can be 
facilitated and automated. 

Keywords: endorsement, reputation, trust, web services 

99 Web-based educational applications: Online curriculum on the semantic Web: the
CSD-UoC portal for peer-to-peer e-learning
Dimitris Kotzinos, Sofia Pediaditaki, Apostolos Apostolidis, Nikolaos Athanasis, Vassilis
Christophides 

May 2005 Proceedings of the 14th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf(1.46 MB) Additional Information: full citation, abstract, references, index terms

Online Curriculum Portals aim to support networks of instructors and learners by 
providing a space of convergence for enhancing peer-to-peer learning interactions among 
individuals of an educational institution. To this end, effective, open and scalable e- 
learning systems are required to acquire, store, and share knowledge under the form of 
learning objects (LO). In this paper, we are interested in exploiting the semantic 
relationships that characterize these LOs (e.g., prerequisite, part-of or ... 

Keywords: IEEE-LOM, e-learning portals, jetspeed portlets, semantic Web 



100 Using remote facilities: Reports on three EDUNET projects: the common user actions
table, SPSS benchmark jobs and interactive benchmark jobs
Elizabeth R. Little

September 1979 Proceedings of the 7th annual ACM SIGUCCS conference on User
services
Publisher: ACM Press 

Full text available: pdf(496.26 KB) Additional Information: full citation, abstract, references

Use of a national computing network fosters a number of concerns from a variety of levels 
within an institution. Two of the most often voiced concerns are: 1) Can users miles away 
from the host computer really learn enough about the host computer to use it? 2) What 
are the costs? The EDUNET Central office has developed materials, with the cooperation 
of the EDUNET suppliers, designed to answer these questions. This report focuses on 
three specific projects that address using a "foreign" computer ... 

101 Special section on data mining for intrusion detection and threat analysis: ADAM: a

Intrusion detection systems have traditionally been based on the characterization of an 
attack and the tracking of the activity on the system to see if it matches that 
characterization. Recently, new intrusion detection systems based on data mining are 
making their appearance in the field. This paper describes the design and experiences 
with the ADAM (Audit Data Analysis and Mining) system, which we use as a testbed to 
study how useful data mining techniques can be in intrusion detection. 